Fine-Grain Multitolerant Barrier Synchronization
Abstract
We design a multitolerant program for synchronizing the phases of concurrent processes. The tolerances of the program enable processes to (i) compute all phases correctly in the presence of faults that corrupt process state in a detectable manner, and (ii) compute only a minimum possible number of phases incorrectly before resuming correct computation in the presence of faults that corrupt process state in an undetectable manner. The program is fine-grain in the sense that each process action either updates the state of that process or involves communication with one of two neighboring processes.
1 Motivation
Barrier synchronization in its general form requires that a set of processes execute a cyclic sequence of phases so that a phase is executed by each process only after all processes have completed the previous phase. This form of synchronization generalizes a variety of others, such as clock unison, phase synchronization, and atomic commitment, and appears frequently in parallel, distributed, and scientific computation applications. Often the design of barrier synchronization has to accommodate the occurrence of faults. Commonly considered examples of faults include incorrect initializations; corruption, loss, reordering, and duplication of messages; processor restarts; and performance and timing violations. While some of these fault-classes can be masked (i.e., even in their presence each phase is executed correctly), others cannot. To accommodate multiple fault-classes, not all of which can be masked, it is convenient to view the effect of faults in each fault-class as a "corruption" of the state of some process, i.e., the state of that process prior to the fault is lost and is replaced with some other value.
The state corruption view suggests that one way to accommodate all of the fault-classes is to design the set of processes to be stabilizing [2], i.e., to recover from an arbitrarily corrupted state to one from where the specification of barrier synchronization is (re)satisfied. Unfortunately, a stabilizing design allows incorrect execution of (a finite number of) phases in the presence of each fault-class before the recovery is complete, and it is thus not ideal for the fault-classes that can be masked. We therefore present in this paper barrier synchronization designs that offer multiple levels of tolerance corresponding to multiple fault-classes, a notion which we refer to as multitolerance [3].
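The barrier requirement stated in the Motivation (no process executes a phase until every process has completed the previous one) can be illustrated with a minimal Python sketch. This is not the multitolerant program of the paper: it tolerates no faults, uses the standard library's `threading.Barrier` rather than fine-grain neighbor communication, and runs a finite number of phases instead of a cyclic sequence. The names `run_phases` and `satisfies_barrier_spec` are hypothetical helpers introduced only for illustration.

```python
import threading


def run_phases(n_procs=4, n_phases=3):
    """Run n_procs threads through n_phases phases separated by a barrier.

    Returns a log of (event, process id, phase) tuples in the order recorded.
    """
    barrier = threading.Barrier(n_procs)
    lock = threading.Lock()
    log = []

    def worker(pid):
        for phase in range(n_phases):
            with lock:
                log.append(("start", pid, phase))
            # The actual work of the phase would go here.
            with lock:
                log.append(("done", pid, phase))
            # Block until every process has completed this phase.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(pid,))
               for pid in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


def satisfies_barrier_spec(log, n_procs):
    """Check that no process starts a phase before all completed the previous one."""
    done = {}  # phase -> set of process ids that completed it
    for event, pid, phase in log:
        if event == "done":
            done.setdefault(phase, set()).add(pid)
        elif phase > 0 and len(done.get(phase - 1, ())) < n_procs:
            return False
    return True
```

A fault-free run such as `satisfies_barrier_spec(run_phases(4, 3), 4)` should pass the check, whereas a hand-built log in which one process starts phase 1 while another has not finished phase 0 should fail it. In the paper's terms, an undetectable state corruption could produce exactly such a violating execution for some phases before correct computation resumes.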
Similar resources
A Fine-Grain Parallel Architecture Based on Barrier Synchronization
Although barrier synchronization has long been considered a useful construct for parallel programming, it has generally been either layered on top of a communication system or used as a completely independent mechanism. Instead, we propose that all communication be made a side-effect of barrier synchronization. This is done by extending the barrier synchronization unit to collect a datum from e...
Filaments: Efficient Support for Fine-Grain Parallelism
It has long been thought that coarse-grain parallelism is much more efficient than fine-grain parallelism due to the overhead of process (thread) creation, context switching, and synchronization. On the other hand, there are several advantages to fine-grain parallelism: architecture independence, ease of programming, ease of use as a target for code generation, and load-balancing potential. Thi...
Multitolerant Barrier Synchronization
We design a multitolerant program for synchronizing the phases of concurrent processes. The tolerances of the program enable processes to (i) execute all phases correctly in the presence of faults that corrupt process state in a detectable manner, and (ii) execute only a minimum possible number of phases incorrectly before resuming correct computation in the presence of faults that corrupt proc...
The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures
A new synchronization mechanism created under the dataflow model of computation was introduced during the late 1970s and called I-Structure. I-Structure exhibited the following important features: (1) it is a dataflow style synchronization, i.e., synchronization only occurs between an I-Structure producer and consumer operations that are accessing the same memory location; (2) it is fine-grain ...